Completing the DRI3 Extension
This week marks a pretty significant milestone for the glorious
DRI3000 future. The first of the two new extensions is complete
and running both full Gnome and KDE desktops.
DRI3 Extension Overview
The DRI3 extension provides facilities for building direct rendering
libraries to work with the X window system. DRI3 provides three basic
mechanisms:
- Open a DRM device.
- Share kernel objects associated with X pixmaps. The direct
rendering client may allocate kernel objects itself and ask the X
server to construct a pixmap referencing them, or the client may
take an existing X pixmap and discover the underlying kernel
object for it.
- Synchronize access to the kernel objects. Within the X server,
Sync Fences are used to serialize access to objects. These Sync
Fences are exposed via file descriptors which the underlying
driver can use to implement synchronization. The current Intel
DRM driver passes a shared page containing a Linux Futex.
Opening the DRM Device
Ideally, the DRM application would be able to just open the graphics
device and start drawing, sending the resulting buffers to the X
server for display. There's work going on to make this possible, but
the current situation has the X server in charge of blessing the
file descriptors used by DRM clients.
DRI2 does this by having the DRM client fetch a magic cookie from
the kernel and pass that to the X server. The cookie is then passed to
the kernel which matches it up with the DRM client and turns on
rendering access for that application.
For DRI3, things are much simpler: the DRM client asks the X server
to pass back a file descriptor for the device. The X server opens the
device, does the magic cookie dance all by itself (at least for now),
and then passes the file descriptor back to the application.
    DRI3Open
        drawable: DRAWABLE
        driverType: DRI3DRIVER
        provider: PROVIDER
      ▶
        nfd: CARD8
        driver: STRING
        device: FD

        Errors: Drawable, Value, Match

        This requests that the X server open the direct rendering
        device associated with drawable, driverType and RandR
        provider. The provider must support SourceOutput or SourceOffload.

        The direct rendering library used to implement the specified
        'driverType' is returned in 'driver'. The file
        descriptor for the device is returned in 'device'. 'nfd' will
        be set to one (this is strictly a convenience for XCB which
        otherwise would need request-specific information about how
        many file descriptors were associated with this reply).
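The 'device' field in the reply is a real file descriptor, not just a number. On Unix systems, descriptors travel between processes as SCM_RIGHTS ancillary data on a Unix-domain socket. The following is a minimal sketch of that hand-off, assuming an already-connected Unix socket. The names send_fd and recv_fd are hypothetical helpers for illustration, not functions from the X server or XCB.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one descriptor. */
union fd_control {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one descriptor over a Unix-domain socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    char byte = 0;      /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* "pass these descriptors along" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The descriptor that arrives has a new number in the receiving process but refers to the same open file, which is what lets the server open the device and hand it to the client.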
Sharing Kernel Pixel Buffers
An explicit non-goal of DRI3 is support for sharing buffers that don't
map directly to regular X pixmaps. So, GL ancillary buffers like depth
and stencil just don't apply here.
The shared buffers in DRI3 are regular X pixmaps in the X server. In
the kernel, the buffers are referenced by DMA-BUF handles, which
provides a nice driver-independent mechanism. Using regular pixmaps
provides a few obvious benefits over the DRI2 scheme:
- Lifetimes are easily managed. Without being associated with a
separate drawable, it's easy to know when to free the Pixmap.
- Regular X requests apply directly. For instance, copying between
buffers can use the core CopyArea request.
To create back and fake-front buffers for windows, the application
creates a kernel buffer, associates a DMA-BUF file descriptor with
that, and then sends the fd to the X server with a pixmap ID to create
the associated pixmap. Doing it in this direction avoids a round trip.
    DRI3PixmapFromBuffer
        pixmap: PIXMAP
        drawable: DRAWABLE
        size: CARD32
        width, height, stride: CARD16
        depth, bpp: CARD8
        buffer: FD

        Errors: Alloc, Drawable, IDChoice, Value, Match

        Creates a pixmap for the direct rendering object associated
        with 'buffer'. Changes to pixmap will be visible in that
        direct rendered object and changes to the direct rendered
        object will be visible in the pixmap.

        'size' specifies the total size of the buffer in bytes. 'width',
        'height' describe the geometry (in pixels) of the underlying
        buffer. 'stride' specifies the number of bytes per scanline in
        the buffer. The pixels within the buffer may not be arranged
        in a simple linear fashion, but 'size' will be at least
        'height' * 'stride'.

        Precisely how any additional information about the buffer is
        shared is outside the scope of this extension.

        If buffer cannot be used with the screen associated with
        drawable, a Match error is returned.

        If depth or bpp are not supported by the screen, a Value error
        is returned.
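The size rule in the request description is easy to check mechanically. Here is a small sketch of a sanity check a client could run before sending a buffer. The struct and function names are hypothetical. Only the 'size' >= 'height' * 'stride' test comes from the text above; the stride and depth tests are extra assumptions that hold for ordinary X pixmap formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container mirroring the DRI3PixmapFromBuffer geometry fields. */
struct buffer_geometry {
    uint32_t size;      /* total size of the buffer in bytes */
    uint16_t width;     /* pixels */
    uint16_t height;    /* pixels */
    uint16_t stride;    /* bytes per scanline */
    uint8_t  depth;     /* significant bits per pixel */
    uint8_t  bpp;       /* storage bits per pixel */
};

/* Returns true when the geometry is self-consistent. */
bool buffer_geometry_plausible(const struct buffer_geometry *g)
{
    uint32_t min_size, row_bytes;

    assert(g != NULL);
    if (g->width == 0 || g->height == 0)
        return false;

    /* Assumption: a pixel cannot have more significant bits than storage. */
    if (g->depth > g->bpp)
        return false;

    /* Assumption: a scanline must hold at least one full row of pixels. */
    row_bytes = ((uint32_t)g->width * g->bpp + 7) / 8;
    if (g->stride < row_bytes)
        return false;

    /* From the request description: size is at least height * stride.
     * Two 16-bit values multiplied always fit in 32 bits. */
    min_size = (uint32_t)g->height * g->stride;
    return g->size >= min_size;
}
```

A buffer may be larger than 'height' * 'stride' (for tiling or alignment padding), so the check only enforces the lower bound.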
To provide for texture-from-pixmap, the application takes the pixmap
ID and passes that to the X server, which returns a file descriptor
for a DMA-BUF which is associated with the underlying kernel buffer.
    DRI3BufferFromPixmap
        pixmap: PIXMAP
      ▶
        nfd: CARD8
        size: CARD32
        width, height, stride: CARD16
        depth, bpp: CARD8
        buffer: FD

        Errors: Pixmap, Match

        Pass back a direct rendering object associated with
        pixmap. Changes to pixmap will be visible in that
        direct rendered object and changes to the direct rendered
        object will be visible in the pixmap.

        'size' specifies the total size of the buffer in bytes. 'width',
        'height' describe the geometry (in pixels) of the underlying
        buffer. 'stride' specifies the number of bytes per scanline in
        the buffer. The pixels within the buffer may not be arranged
        in a simple linear fashion, but 'size' will be at least
        'height' * 'stride'.

        Precisely how any additional information about the buffer is
        shared is outside the scope of this extension.

        If buffer cannot be used with the screen associated with
        drawable, a Match error is returned.
Tracking Window Size Changes
When Eric Anholt and I first started discussing DRI3, we hoped to
avoid needing to learn about the window size from the X server. The
thought was that the union of all of the viewports specified by the
application would form the bounds of the drawing area. When the window
size changed, we expected the application would change the viewport.
Alas, this simple plan isn't sufficient here: a few GL functions
are not limited to the viewport. So, we need to track the actual
window size and monitor changes to it.
DRI2 does this by delivering invalidate events to the application
whenever the current buffer isn't valid; the application discovers
that this event has been delivered and goes to ask the X server for the
new buffers. There are a couple of problems with this approach:
- Any outstanding DRM rendering requests will still draw to the
old buffers.
- The Invalidate events must be captured before the application sees
the related ConfigureNotify event so that the GL library can react
appropriately.
The first problem is pretty intractable within DRI2: the application
has no way of knowing whether a frame that it has drawn was delivered
to the correct buffer as the underlying buffer object can change at
any time. DRI3 fixes this by having the application in control of
buffer management; it can easily copy data from the previous back
buffer to the new back buffer synchronized to its own direct
rendering.
The second problem was solved in DRI2 by using the existing Xlib event
hooks; the GL library directly implements the Xlib side of the DRI2
extension and captures the InvalidateBuffers events within that code,
delivering those to the driver code. The problem with this solution is
that Xlib holds the Display structure mutex across this whole mess,
and Mesa must be very careful not to make any Xlib calls during the
invalidate call.
For DRI3, I considered placing the geometry data in a shared memory
buffer, but my future plans for the Present extension led me to want
an X event instead (more about the Present extension in a future
posting).
An X ConfigureNotify event is sufficient for the current
requirements to track window sizes accurately. However, there's no
easy way for the GL library to ensure that ConfigureNotify events will
be delivered to the application: other application code may (and
probably will) adjust the window event mask for its own uses. I
considered adding the necessary event mask tracking code within XCB,
but again, knowing that the Present extension would probably need
additional information anyhow, decided to create a new event instead.
Using an event requires that XCB provide some mechanism to capture
those events, keep them from the regular X event stream, and deliver
them to the GL library. A further requirement is that the GL library
be absolutely assured of receiving notification about these events
before the regular event processing within the application will see a
core ConfigureNotify event.
The method I came up with for XCB is fairly specific to my
requirements. The events are always XGE events, and are tagged with
a special 'event context ID', an XID allocated for this purpose. The
combination of the extension op-code, the event type and this event
context ID are used to split off these events to custom event queues
using the following APIs:
    /**
     * @brief Listen for a special event
     */
    xcb_special_event_t *xcb_register_for_special_event(xcb_connection_t *c,
                                                        uint8_t extension,
                                                        uint16_t evtype,
                                                        uint32_t eid,
                                                        uint32_t *stamp);
This creates a special event queue which will contain only events
matching the specified extension/type/event-id triplet.
    /**
     * @brief Returns the next event from a special queue
     */
    xcb_generic_event_t *xcb_check_for_special_event(xcb_connection_t *c,
                                                     xcb_special_event_t *se);
This pulls an event from a special event queue. These events will not
appear in the regular X event queue and so applications will never see them.
There's one more piece of magic here: the stamp value passed to
xcb_register_for_special_event. This pointer refers to a location in
memory which will be incremented every time an event is placed in the
special event queue. The application can cheaply monitor this memory
location for changes and know when to check the queue for events.
Within GL, the value used is the existing dri2 stamp value. That is
checked at the top of the rendering operation; if it has changed, the
drawing buffers will be re-acquired. Part of the buffer acquisition
process is a check for special events related to the window.
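The stamp idea can be shown without any X code at all. Below is a single-threaded toy sketch: the queue bumps a caller-owned counter on every enqueue, and the consumer only looks at the queue when the counter differs from the value it last saw. All names here are hypothetical, the event payload is reduced to one integer, and a real implementation would need locking or atomics because XCB fills the queue from whichever thread reads the connection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 16

/* Toy stand-in for a special event queue; not the XCB data structure. */
struct special_queue {
    uint32_t  events[QUEUE_CAPACITY]; /* one integer stands in for an event */
    size_t    head, tail;             /* ring indices; equal means empty */
    uint32_t *stamp;                  /* caller-owned counter, bumped on push */
};

void queue_init(struct special_queue *q, uint32_t *stamp)
{
    assert(q != NULL && stamp != NULL);
    q->head = q->tail = 0;
    q->stamp = stamp;
}

/* Producer side: returns 0 on success, -1 if the ring is full. */
int queue_push(struct special_queue *q, uint32_t event)
{
    size_t next = (q->tail + 1) % QUEUE_CAPACITY;

    if (next == q->head)
        return -1;
    q->events[q->tail] = event;
    q->tail = next;
    (*q->stamp)++;                    /* tell the consumer something arrived */
    return 0;
}

/* Consumer side: returns 1 and stores the event, or 0 if the ring is empty. */
int queue_pop(struct special_queue *q, uint32_t *event)
{
    if (q->head == q->tail)
        return 0;
    *event = q->events[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    return 1;
}

/* Called at the top of a rendering operation. The common case is one
 * integer comparison; the queue is drained only when the stamp moved.
 * Returns the number of events consumed; *latest holds the last one. */
int drain_if_stamp_changed(struct special_queue *q, uint32_t *last_stamp,
                           uint32_t *latest)
{
    uint32_t event;
    int count = 0;

    if (*q->stamp == *last_stamp)
        return 0;
    *last_stamp = *q->stamp;
    while (queue_pop(q, &event)) {
        *latest = event;
        count++;
    }
    return count;
}
```

Keeping only the latest event mirrors how a renderer cares about the current window size rather than every intermediate one, though that policy is a choice made for this sketch.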
For now, I've placed these events in the DRI3 extension. However, they
will move to the Present extension once that is working.
    DRI3SelectInput
        eventContext: DRI3EVENTID
        window: WINDOW
        eventMask: SETofDRI3EVENT

        Errors: Window, Value, Match, IDChoice

        Selects the set of DRI3 events to be delivered for the
        specified window and event context. DRI3SelectInput can
        create, modify or delete event contexts. An event context is
        associated with a specific window; using an existing event
        context with a different window generates a Match error.

        If eventContext specifies an existing event context, then if
        eventMask is empty, DRI3SelectInput deletes the specified
        context, otherwise the specified event context is changed to
        select a different set of events.

        If eventContext is an unused XID, then if eventMask is empty
        no operation is performed. Otherwise, a new event context is
        created selecting the specified events.
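The create/modify/delete rules above amount to a small decision table. Here is a toy sketch of that logic over a fixed-size array. Every name is hypothetical, and SELECT_TABLE_FULL is an artifact of the fixed array in this sketch, not an error defined by the extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONTEXTS 8

enum select_result { SELECT_OK, SELECT_MATCH_ERROR, SELECT_TABLE_FULL };

struct event_context {
    uint32_t id;        /* event context XID; 0 marks a free slot */
    uint32_t window;    /* window the context is bound to */
    uint32_t mask;      /* selected events */
};

struct context_table {
    struct event_context slots[MAX_CONTEXTS];
};

/* Returns the slot holding 'id', or NULL. Passing 0 finds a free slot. */
struct event_context *find_context(struct context_table *t, uint32_t id)
{
    for (int i = 0; i < MAX_CONTEXTS; i++)
        if (t->slots[i].id == id)
            return &t->slots[i];
    return NULL;
}

/* Toy model of the DRI3SelectInput rules described in the text. */
enum select_result select_input(struct context_table *t, uint32_t id,
                                uint32_t window, uint32_t mask)
{
    struct event_context *ctx;

    assert(id != 0);                    /* 0 is reserved for free slots */
    ctx = find_context(t, id);
    if (ctx != NULL) {
        if (ctx->window != window)      /* context belongs to another window */
            return SELECT_MATCH_ERROR;
        if (mask == 0)
            ctx->id = 0;                /* empty mask deletes the context */
        else
            ctx->mask = mask;           /* otherwise change the selection */
        return SELECT_OK;
    }
    if (mask == 0)                      /* unused XID, empty mask: no-op */
        return SELECT_OK;
    ctx = find_context(t, 0);
    if (ctx == NULL)
        return SELECT_TABLE_FULL;
    ctx->id = id;                       /* unused XID, non-empty mask: create */
    ctx->window = window;
    ctx->mask = mask;
    return SELECT_OK;
}
```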
The events themselves look a lot like a configure notify event:
    DRI3ConfigureNotify
        type: CARD8                 XGE event type (35)
        extension: CARD8            DRI3 extension request number
        length: CARD16              2
        evtype: CARD16              DRI3_ConfigureNotify
        eventID: DRI3EVENTID
        window: WINDOW
        x: INT16
        y: INT16
        width: CARD16
        height: CARD16
        off_x: INT16
        off_y: INT16
        pixmap_width: CARD16
        pixmap_height: CARD16
        pixmap_flags: CARD32

        'x' and 'y' are the parent-relative location of 'window'.
Note that there are a couple of odd additional fields: off_x, off_y,
pixmap_width, pixmap_height and pixmap_flags are all placeholders for
what I expect to end up in the Present extension. For now, in DRI3,
they should be ignored.
Synchronization
The DRM application needs to know when various X requests related to
its buffers have finished. In particular, when performing a buffer
swap, the client wants to know when that completes, and be able to
block until it has. DRI2 does this by having the application make a
synchronous request from the X server to get the names of the new back
buffer for drawing the next frame. This has two problems:
- The synchronous round trip to the X server isn't free. Other
running applications may cause fairly arbitrary delays in getting
the reply back from the X server.
- Synchronizing with the X server doesn't ensure that GPU operations
are necessarily serialized between the application and the X
server.
What we want is a serialization guarantee between the X server and the
DRM application that operates at the GPU level.
I've written a couple of times ("dri3k first steps" and
"Shared Memory Fences") about using X Sync extension Fences
(created by James Jones and Aaron Plattner) for this synchronization
and wanted to get a bit more specific here.
Within the X server, a Sync extension Fence is essentially
driver-specific, allowing the hardware design to control how the
actual synchronization is performed. DRI3 creates a way to share the
underlying operating system object by passing a file descriptor from
application to the X server which somehow references that device
object. Both sides of the protocol need to tacitly agree on what it
means.
    DRI3FenceFromFD
        drawable: DRAWABLE
        fence: FENCE
        initially-triggered: BOOL
        fd: FD

        Errors: IDChoice, Drawable

        Creates a Sync extension Fence that provides the regular Sync
        extension semantics along with a file descriptor that provides
        a device-specific mechanism to manipulate the fence directly.
        Details about the mechanism used with this file descriptor are
        outside the scope of the DRI3 extension.
For the current GEM kernel interface, because all GPU access is
serialized at the kernel API, it's sufficient to serialize access to
the kernel itself to ensure operations are serialized on the GPU. So,
for GEM, I'm using a shared memory futex for the DRI3 synchronization
primitive. That does not mean that all GPUs will share this same
mechanism. Eliminate the kernel serialization guarantee and some more
GPU-centric design will be required.
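To make the futex idea concrete, here is a rough Linux-only sketch of a fence living in one shared 32-bit word. It is not the xshmfence code: the three-state encoding, the function names, and the use of GCC/Clang __atomic builtins are all choices made for this illustration. The demo shares the word through an anonymous shared mapping and fork(), standing in for the file descriptor that DRI3FenceFromFD would carry.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* State encoding chosen for this sketch. */
enum { FENCE_IDLE = 0, FENCE_TRIGGERED = 1, FENCE_WAITING = -1 };

/* Thin wrapper: glibc provides no futex() function of its own. */
static long futex(int32_t *addr, int op, int32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void fence_reset(int32_t *f)
{
    __atomic_store_n(f, FENCE_IDLE, __ATOMIC_SEQ_CST);
}

int fence_query(int32_t *f)
{
    return __atomic_load_n(f, __ATOMIC_SEQ_CST) == FENCE_TRIGGERED;
}

void fence_trigger(int32_t *f)
{
    /* Only enter the kernel if somebody announced that they are sleeping. */
    if (__atomic_exchange_n(f, FENCE_TRIGGERED, __ATOMIC_SEQ_CST) == FENCE_WAITING)
        futex(f, FUTEX_WAKE, INT_MAX);
}

void fence_await(int32_t *f)
{
    assert(f != NULL);
    for (;;) {
        int32_t expected = FENCE_IDLE;

        /* Try IDLE -> WAITING; a failure tells us the current state. */
        if (!__atomic_compare_exchange_n(f, &expected, FENCE_WAITING, 0,
                                         __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)
            && expected == FENCE_TRIGGERED)
            return;
        /* Sleeps only while the word still reads WAITING, so a trigger
         * that raced ahead of us makes this return immediately. */
        futex(f, FUTEX_WAIT, FENCE_WAITING);
    }
}

/* Demo: the child triggers the fence, the parent blocks until it does.
 * Returns 0 on success, -1 on failure. */
int fence_demo(void)
{
    int status = -1;
    pid_t pid;
    int32_t *f = mmap(NULL, sizeof(*f), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (f == MAP_FAILED)
        return -1;
    fence_reset(f);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        usleep(10000);          /* give the parent time to go to sleep */
        fence_trigger(f);
        _exit(0);
    }
    fence_await(f);
    if (waitpid(pid, &status, 0) != pid || !fence_query(f))
        return -1;
    munmap(f, sizeof(*f));
    return 0;
}
```

In the uncontended case both trigger and await are a single atomic operation in user space; the kernel is involved only when a waiter actually has to sleep.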
What about Swap Buffers?
None of the above stuff actually gets bits onto the screen. For now,
the GL implementation is simply taking the X pixmap and copying it to
the window at SwapBuffers time. This is sufficient to run
applications, but doesn't provide for all of the fancy swap options,
like limiting to the frame rate or optimizing full-screen swaps.
I've decided to relegate all of that functionality to the
as-yet-unspecified Present extension.
Because the whole goal of DRI3 was to get direct rendered application
contents into X pixmaps, the Present extension will operate on those
X objects directly. This means it will also be usable with non-DRM
applications that use simple X pixmap based double buffering, a class
which includes most existing non-GL based Gtk+ and Qt applications.
So, I get to reduce the size of the DRI3 extension while providing
additional functionality for non direct-rendered applications.
Current Status
As I said above, all of the above functionality is running on my
systems and has booted both complete KDE and Gnome sessions. There
have been some recent DMA-BUF related fixes in the kernel, so you'll
need to run the latest 3.9.x stable release or a 3.10 release
candidate.
Here are references to all of the appropriate git repositories:
DRI3 protocol and spec:
    git://people.freedesktop.org/~keithp/dri3proto master
XCB protocol:
    git://people.freedesktop.org/~keithp/xcb/proto.git dri3
XCB library:
    git://people.freedesktop.org/~keithp/xcb/libxcb.git dri3
xshmfence library:
    git://people.freedesktop.org/~keithp/libxshmfence.git master
X server:
    git://people.freedesktop.org/~keithp/xserver.git dri3
Mesa:
    git://people.freedesktop.org/~keithp/mesa.git dri3
Next Steps
Now it's time to go write the Present extension and get that
working. I'll start coding and should have another posting here next
week.